ABSTRACT



The invention provides a perceptually-based system for 
pattern retrieval and matching, suitable for use in a wide 
variety of information processing applications. An illustra- 
tive embodiment of the system uses a predetermined 
vocabulary comprising one or more dimensions to extract 
color and texture information from an information signal, 
e.g., an image, selected by a user. The system then generates 
a distance measure characterizing the relationship of the 
selected image to another image stored in a database, by 
applying a grammar, comprising a set of predetermined 
rules, to the color and texture information extracted from the 
selected image and corresponding color and texture infor- 
mation associated with the stored image. The vocabulary 
may include dimensions such as overall color, directionality 
and orientation, regularity and placement, color purity, and 
pattern complexity and heaviness. The rules in the grammar 
may include equal pattern, overall appearance, similar 
pattern, and dominant color and general impression, with 
each of the rules expressed as a logical combination of 
values generated for one or more of the dimensions. The 
distance measure may include separate color and texture 
metrics characterizing the similarity of the respective color 
and texture of the two images being compared. The inven- 
tion is also applicable to other types of information signals, 
such as sequences of video frames. 

18 Claims, 4 Drawing Sheets 
RETRIEVAL AND MATCHING OF COLOR
PATTERNS BASED ON A PREDETERMINED 
VOCABULARY AND GRAMMAR 

FIELD OF THE INVENTION 

The present invention relates generally to techniques for 
processing images, video and other types of information 
signals, and more particularly to automated systems and 
devices for retrieving, matching and otherwise manipulating 
information signals which include color pattern information. 

BACKGROUND OF THE INVENTION 

Flexible retrieval and manipulation of image databases 
and other types of color pattern databases has become an 
important problem with applications in video editing, photo- 
journalism, art, fashion, cataloging, retailing, interactive 
CAD, geographic data processing, etc. Until recently, 
content-based retrieval (CBR) systems have generally 
required a user to enter key words to search image and video 
databases. Unfortunately, this approach often does not work 
well, since different people describe what they see or what 
they search for in different ways, and even the same person 
might describe the same image differently depending on the 
context in which it will be used. 

One of the earliest CBR systems, known as ART
MUSEUM and described in K. Hirata and T. Katzo, "Query
by visual example," Proc. of 3rd Intl. Conf. on Extending
Database Technology, performs retrieval entirely based on
edge features. A commercial content-based image search
engine with profound effects on later systems was QBIC,
described in W. Niblack et al., "The QBIC project: Querying
images by content using color, texture and shape," Proc.
SPIE Storage and Retrieval for Image and Video Data Bases,
February, 1994. For color representation, this system uses a
k-element histogram and average of (R,G,B), (Y,i,q), and
(L,a,b) coordinates, whereas for the description of texture it
implements Tamura's feature set, as described in H. Tamura
et al., "Textural features corresponding to visual
perception," IEEE Transactions on Systems, Man and
Cybernetics, Vol. 8, pp. 460-473, 1982.

In a similar fashion, color, texture and shape are supported
as a set of interactive tools for browsing and searching
images in the Photobook system developed at the MIT
Media Lab, as described in A. Pentland et al., "Photobook:
Content-based manipulation of image databases," International
Journal of Computer Vision, 1996. In addition to
providing these elementary features, systems such as
VisualSeek, described in J. R. Smith and S. Chang,
"VisualSeek: A fully automated content-based query system,"
Proc. ACM Multimedia 96, 1996, Netra, described in W. Y.
Ma and B. S. Manjunath, "Netra: A toolbox for navigating
large image databases," Proc. IEEE Int. Conf. on Image
Processing, 1997, and Virage, described in A. Gupta and R.
Jain, "Visual information retrieval," Communications of the
ACM, Vol. 40, No. 5, 1997, each support queries based on
spatial relationships and color layout. Moreover, in the
above-noted Virage system, the user can select a combination
of implemented features by adjusting the weights
according to his or her own "perception." This paradigm is
also supported in the RetrievalWare search engine described in
J. Dowe, "Content based retrieval in multimedia imaging,"
Proc. SPIE Storage and Retrieval for Image and Video
Databases, 1993.

A different approach to similarity modeling is proposed in
the MARS system, described in Y. Rui et al., "Content-based
image retrieval with relevance feedback in MARS," Proc.
IEEE Conf. on Image Processing, 1997, where the main
focus is not on finding a best representation, but rather on the
relevance feedback that will dynamically adapt multiple
visual features to different applications and different users.

Hence, although great progress has been made, none of the
existing search engines offers a complete solution to the
general image retrieval problem, and there remain significant
drawbacks with the existing techniques which prevent
their use in many important practical applications.

These drawbacks can be attributed to a very limited
understanding of color patterns compared to other visual
phenomena such as color, contrast or even gray-level textures.
For example, the basic dimensions of color patterns
have not yet been adequately identified, a standardized and
effective set of features for addressing their important
characteristics does not exist, nor are there rules defining how
these features are to be combined. Previous investigations in
this field have concentrated mainly on gray-level natural
textures, e.g., as described in the above-cited H. Tamura et
al. reference, and in A. R. Rao and G. L. Lohse, "Towards
a texture naming system: Identifying relevant dimensions of
texture," Vision Res., Vol. 36, No. 11, pp. 1649-1669, 1996.
For example, the Rao and Lohse reference focused on how
people classify textures in meaningful, hierarchically-structured
categories, identifying relevant features used in
the perception of gray-level textures. However, these
approaches fail to address the above-noted color pattern
problem, and a need remains for an effective framework for
analyzing color patterns.


SUMMARY OF THE INVENTION 

The invention provides a perceptually-based system for
pattern retrieval and matching, suitable for use in a wide
variety of information processing applications. The system
is based in part on a vocabulary, i.e., a set of perceptual
criteria used in comparison between color patterns associated
with information signals, and a grammar, i.e., a set of
rules governing the use of these criteria in similarity
judgment. The system utilizes the vocabulary to extract
perceptual features of patterns from images or other types of
information signals, and then performs comparisons
between the patterns using the grammar rules. The invention
also provides new color and texture distance metrics that
correlate well with human performance in judging pattern
similarity.

An illustrative embodiment of a perceptually-based system
in accordance with the invention uses a predetermined
vocabulary comprising one or more dimensions to extract
color and texture information from an information signal,
e.g., an image, selected by a user. The system then generates
a distance measure characterizing the relationship of the
selected image to another image stored in a database, by
applying a grammar, comprising a set of predetermined
rules, to the color and texture information extracted from the
selected image and corresponding color and texture information
associated with the stored image. For example, the
system may receive the selected image in the form of an
input image A submitted in conjunction with a query from
the user. The system then measures dimensions DIM_i(A)
from the vocabulary, for i=1, . . . , N, and for each image B
from an image database, applies rules R_i from the grammar
to obtain corresponding distance measures dist_i(A, B),
where dist_i(A, B) is the distance between the images A and
B according to the rule i.

In accordance with the invention, the vocabulary may
include dimensions such as overall color, directionality and
orientation, regularity and placement, color purity, and
pattern complexity and heaviness. The rules in the grammar
may include equal pattern, overall appearance, similar
pattern, and dominant color and general impression, with
each of the rules expressed as a logical combination of
values generated for one or more of the dimensions. The
distance measure may include separate color and texture
metrics characterizing the similarity of the respective color
and texture of the two patterns being compared.

A major advantage of a pattern retrieval and matching
system in accordance with the invention is that it eliminates
the need for selecting the visual primitives for image
retrieval and expecting the user to assign weights to them, as
required in most current systems. Furthermore, the invention
is suitable for use in a wide variety of pattern domains,
including art, photography, digital museums, architecture,
interior design, and fashion.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a portion of a pattern retrieval and matching
system in accordance with the invention.

FIG. 2 shows a more detailed view of a color representation
and modeling process implemented in a feature
extraction element in the FIG. 1 system.

FIG. 3 shows a more detailed view of a texture representation
and modeling process implemented in the feature
extraction element in the FIG. 1 system.

FIG. 4 shows an exemplary communication system application
of the pattern retrieval and matching system of FIG. 1.

FIG. 5 is a flow diagram illustrating the operation of the
pattern retrieval and matching system in the communication
system of FIG. 4.

DETAILED DESCRIPTION OF THE INVENTION

The invention provides a vocabulary, i.e., a set of perceptual
criteria used in judging similarity of color patterns,
their relative importance and relationships, as well as a
grammar, i.e., a hierarchy of rules governing the use of the
vocabulary in similarity judgment. It has been determined
that these attributes are applicable to a broad range of
textures, from simple patterns to complex, high-level visual
texture phenomena. The vocabulary and grammar are utilized
in a pattern matching and retrieval system that, in an
illustrative embodiment, receives one or more information
signals as input, and depending on the type of query,
produces a set of choices modeled on human behavior in
pattern matching. The term information signal as used
herein is intended to include an image, a sequence of video
frames, or any other type of information signal that may be
characterized as including a pattern.

1.0 Vocabulary and Grammar of Color Patterns

The exemplary vocabulary and grammar to be described
herein have been determined through experimentation, using
multidimensional scaling and hierarchical clustering techniques
to interpret the experimental data. Multidimensional
scaling (MDS) was applied to determine the most important
dimensions of pattern similarity, while hierarchical cluster
analysis (HCA) was used to understand how people combine
these dimensions when comparing color patterns.

MDS is a well-known set of techniques that uncover the
hidden structures in data, and is described in greater detail
in J. Kruskal and M. Wish, "Multidimensional scaling,"
Sage Publications, London, 1978. MDS is designed to
analyze distance-like data called similarity data; that is, data
indicating the degree of similarity between two items.
Traditionally, similarity data is obtained via subjective
measurement. It is acquired by asking people to rank similarity
of pairs of objects, i.e., stimuli, on a scale. The obtained
similarity value connecting stimulus i to stimulus j is
denoted by δ_ij. Similarity values are arranged in a similarity
matrix A, usually by averaging δ_ij obtained from all
measurements. The aim of MDS is to place each stimulus from
the input set into an n-dimensional stimulus space. The
dimensionality n of the space is also determined in the
experiment. The points x_i = [x_i1, . . . , x_in] representing each
stimulus are arranged so that the Euclidean distances d_ij
between each pair of points in the stimulus space match as
closely as possible the subjective similarities δ_ij between
corresponding pairs of stimuli. Types of MDS suitable for
use in conjunction with the invention include classical MDS
(CMDS) and weighted MDS (WMDS). Additional details
regarding these and other types of MDS may be found in the
above-cited J. Kruskal and M. Wish reference.

HCA is described in greater detail in R. Duda and P. Hart,
"Pattern classification and scene analysis," John Wiley &
Sons, New York, N.Y., 1973. Given a similarity matrix, HCA
organizes a set of stimuli into similar units. Therefore, HCA
can be used to determine a set of rules and the rule hierarchy
for judging similarity in pattern matching. This method
starts from the stimulus set to build a tree. Before the
procedure begins, all stimuli are considered as separate
clusters, hence there are as many clusters as there are ranked
stimuli. The tree is formed by successively joining the most
similar pairs of stimuli into new clusters. At every step,
either an individual stimulus is added to the existing clusters, or
existing clusters are merged. The grouping continues
until all stimuli are members of a single cluster. The manner
in which the similarity matrix is updated at each stage of the
tree is determined by the joining algorithm. There are many
possible criteria for deciding how to merge clusters. Some of
the simplest methods use a nearest neighbor technique,
where the first two objects combined are those that have the
smallest distance between them. Another commonly used
technique is the furthest neighbor technique, where the
distance between two clusters is obtained as the distance
between their furthest points. The centroid method calculates
the distance between two clusters as the distance
between their means. Also, since the merging of clusters at
each step depends on the distance measure, different distance
measures can result in different clustering solutions for
the same clustering method. These and other HCA techniques
are described in detail in the above-cited R. Duda and
P. Hart reference.

Clustering techniques are often used in combination with
MDS, to clarify the obtained dimensions. However, in the
same way as with the labeling of the dimensions in the MDS
algorithm, interpretation of the clusters is usually done
subjectively and strongly depends on the quality of the data.

1.1 Vocabulary: Most Important Dimensions of Color Patterns

The above-noted vocabulary will now be described in
greater detail. Experiments were performed to determine
subjective impressions of 20 different patterns from interior
design catalogs. There were 28 subjects taking part in the
experiment, each presented with all 190 possible pairs of
patterns. For each pair, the subjects were asked to rate the
degree of overall similarity on a scale ranging from 0 for "very
different" to 100 for "very similar." There were no instructions
concerning the characteristics on which these similarity
judgments were to be made, since this was what the
experiment was designed to discover. The order of presentation
for each subject was different and was determined
through the use of a random number generator.

The first step in the data analysis was to arrange subjects'
ratings into a similarity matrix A to be an input to a
two-dimensional and three-dimensional CMDS procedure.
Also, a WMDS procedure was applied to the set of 28
individual similarity matrices. WMDS was performed in
two, three, four, five and six dimensions. The WMDS error
for the two-dimensional solution was 0.31, indicating that a
higher-dimensional solution was necessary, i.e., that the
error was still substantial. The WMDS errors for the three-,
four-, five- and six-dimensional configurations were 0.26,
0.20, 0.18 and 0.16, respectively. The analysis was not
extended beyond six dimensions since further increases did
not result in a noticeable decrease of the error.
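
The arrangement of ratings into an averaged similarity matrix can be illustrated with a minimal sketch. It assumes each subject's ratings are held in a dictionary keyed by pattern pair; the function name average_similarity, the data layout, and the diagonal value of 100 are illustrative assumptions and are not taken from the experiment itself.

```python
from itertools import combinations
from statistics import mean


def average_similarity(ratings_by_subject, n_patterns):
    """Average per-subject pair ratings into one symmetric matrix.

    ratings_by_subject: list of dicts mapping (i, j) with i < j to a
    rating on the 0-100 scale (hypothetical data layout).
    """
    pairs = list(combinations(range(n_patterns), 2))
    # Assumption: a pattern is maximally similar to itself (100).
    matrix = [[100.0 if i == j else 0.0 for j in range(n_patterns)]
              for i in range(n_patterns)]
    for i, j in pairs:
        value = mean(r[(i, j)] for r in ratings_by_subject)
        matrix[i][j] = matrix[j][i] = value
    return matrix, pairs


# Toy data: 2 subjects rating 3 patterns (the experiment used 28 and 20).
ratings = [
    {(0, 1): 90, (0, 2): 20, (1, 2): 30},
    {(0, 1): 70, (0, 2): 40, (1, 2): 10},
]
avg, pairs = average_similarity(ratings, 3)

# With 20 patterns there are 20 * 19 / 2 = 190 unordered pairs.
n_pairs_for_20 = len(list(combinations(range(20), 2)))
```

The averaged matrix would feed a CMDS routine, while WMDS would instead consume the individual per-subject matrices; neither MDS procedure is implemented in this sketch.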

The two-dimensional CMDS procedure indicated that the
important dimensions were: 1) presence/absence of a dominant
color, also referred to herein as "overall color," and 2)
color purity. It is interesting that both dimensions are purely
color based, indicating that, at the coarsest level of
judgment, people primarily use color to judge similarity. As
will be seen below, these dimensions remained in all solutions.
Moreover, the two-dimensional configuration strongly
resembles one of the perpendicular projections in the three-,
four- and five-dimensional solutions. The same holds for all
three dimensions from the three-dimensional solution,
indicating that these features could be the most general in human
perception. For both CMDS and WMDS, the same three
dimensions emerged from the three-dimensional configurations:
1) overall color, 2) color purity, and 3) regularity and
placement. The four-dimensional WMDS solution revealed
the following dimensions: 1) overall color, 2) color purity, 3)
regularity and placement, and 4) directionality. The
five-dimensional WMDS solution came with the same four
dominant characteristics with the addition of a dimension
that is referred to herein as "pattern heaviness." This fifth
dimension did not improve the goodness-of-fit significantly,
since it changed the WMDS error from 0.20 (for four
dimensions) to 0.18 (for five dimensions). Hence, as a result
of the above-described experiment, the following five
important similarity criteria were determined:

DIMENSION 1 — overall color, which can be described in
terms of the presence/absence of a dominant color. At the
negative end of this dimension are patterns with an overall
impression of a single dominant color. This impression is
created mostly because the percentage of one color is truly
dominant. However, a multicolored image can also create an
impression of dominant color. This happens when all the
colors within the multicolored image are similar, having
similar hues but different intensities or saturation. At the
positive end of this dimension are patterns where no single
color is perceived as dominant.

DIMENSION 2 — directionality and orientation. This
dimension represents a dominant orientation in the edge
distribution, or a dominant direction in the repetition of the
structural element. The lowest values along this dimension
have patterns with a single dominant orientation, such as
stripes and then checkers. Midvalues are assigned to patterns
with a noticeable but not dominant orientation, followed by
the patterns where a repetition of the structural element is
performed along two directions. Finally, completely nonoriented
patterns and patterns with uniform distribution of
edges or nondirectional placement of the structural element
are at the positive end of this dimension.



DIMENSION 3 — regularity and placement. This dimen- 
sion describes the regularity in the placement of the struc- 
tural element, its repetition and uniformity. At the negative 
end of this dimension are regular, uniform and repetitive 
patterns (with repetition completely determined by a certain 
set of placement rules), whereas at the opposite end are 
nonrepetitive or nonuniform patterns. 

DIMENSION 4 — color purity. This dimension divides
patterns according to the degree of their colorfulness. At the
negative end are pale patterns, patterns with unsaturated
overtones, and patterns with dominant "sandy" or "earthy"
colors. At the positive end are patterns with very saturated
and very pure colors. Hence, this dimension is also referred
to as overall chroma or overall saturation within an image.

DIMENSION 5 — pattern complexity and heaviness. This
dimension appeared only in the last, five-dimensional
configuration. Also, as will be shown below, it is not used in
judging similarity until the very last level of comparison. For
that reason it is also referred to herein as "general
impression." At one end of this dimension are patterns that are
perceived as "light" and "soft," while at the other end are
patterns described by subjects as "heavy," "busy" and
"sharp."

1.2 Grammar: Rules For Judging Similarity 

A grammar, i.e., a set of rules governing use of the 
above-described dimensions, was then determined. HCA 
was used to order groups of patterns according to the degree
of similarity, as perceived by subjects, and to derive a list of 
similarity rules and the sequence of their application. For 
example, it was observed that the very first clusters were 
composed of pairs of equal patterns. These were followed by 
the clusters of patterns with similar color and dominant 
orientation. The HCA analysis led to the following rules: 

RULE 1 — equal pattern. Regardless of color, two textures 
with exactly the same pattern are always judged to be the 
most similar. Hence, this rule uses Dimension 2 
(directionality) and Dimension 3 (pattern regularity and 
placement). 

RULE 2 — overall appearance. Rule 2 uses the combina- 
tion of Dimension 1 (dominant color) and Dimension 2 
(directionality). Two patterns that have similar values in 
both dimensions are also perceived as similar. 

RULE 3 — similar pattern. Rule 3 concerns either Dimen- 
sion 2 (directionality) or Dimension 3 (pattern regularity and 
placement). Hence, two patterns which are dominant along 
the same direction(s) are seen as similar, regardless of their 
color. In the same manner, patterns with the same placement 
or repetition of the structural element are seen as similar, 
even if the structural element is not exactly the same. 

RULE 4 — dominant color. Two multicolored patterns are
perceived as similar if they possess the same color
distributions regardless of their content, directionality, placement
or repetition of a structural element. This also holds for
patterns that have the same dominant or overall color.
Hence, this rule involves only Dimension 1 (dominant
color).

RULE 5 — general impression. Rule 5 concerns Dimen- 
sions 4 and 5, and divides patterns into "dim", "smooth", 
"earthy", "romantic" or "pale" patterns (at one end of the 
corresponding dimension) as opposed to "bold", "bright", 
"strong", "pure", "sharp", "abstract" or "heavy" patterns (at 
the opposite end). This rule represents the complex combi- 
nation of color, contrast, saturation and spatial frequency, 
and therefore applies to patterns at the highest, abstract level 
of understanding. 
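
The HCA joining procedure used to derive the ordering of these rules can be sketched as follows. This is a minimal illustration of the nearest neighbor technique only, under the assumption that similarity ratings have already been converted to dissimilarities (for example, 100 minus the averaged rating); the function name single_linkage and the toy matrix are hypothetical and do not reproduce the experimental data.

```python
def single_linkage(dist):
    """Agglomerative clustering with the nearest neighbor criterion.

    dist: symmetric matrix of dissimilarities between stimuli.
    Returns the merges as (cluster_a, cluster_b, distance) tuples,
    in the order in which they were performed.
    """
    # Every stimulus starts as its own cluster.
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Nearest neighbor: distance between closest members.
                d = min(dist[i][j]
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((set(clusters[a]), set(clusters[b]), d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Toy dissimilarities for four stimuli (illustrative values only).
toy = [
    [0, 1, 6, 7],
    [1, 0, 5, 8],
    [6, 5, 0, 2],
    [7, 8, 2, 0],
]
merges = single_linkage(toy)
```

The earliest merges correspond to the most similar stimuli, which is how the first clusters of equal patterns would surface ahead of looser groupings. Replacing min with max in the inner distance gives the furthest neighbor technique.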
The above set of rules represents an illustrative embodiment
of a basic grammar of pattern matching in accordance
with the invention. It should be noted that, in a given
application, each rule can be expressed as a logical
expression, e.g., a logical combination, using operators such
as OR, AND, XOR, NOT, etc., of the pattern values along
the dimensions involved in the rule. For example, consider
a cluster composed of Patterns X and Y that have similar
overall color and dominant orientation. The values associated
with Patterns X and Y along both Dimensions 1 and 2
are very close. Consequently, X and Y are perceived as
similar according to the Rule 2, which may be expressed in
the following way:

(DIM_1(X) similar to DIM_1(Y)) AND (DIM_2(X) similar to
DIM_2(Y)).

Of course, numerous other logical expressions involving the 
values of particular patterns along a given set of dimensions 
may be generated in accordance with the invention. 
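
As a minimal sketch of such logical expressions, Rule 2 and Rule 3 can be written as Boolean combinations of per-dimension comparisons. The scalar dimension values, the tolerance tol, and the function names are assumptions made for illustration; the text does not specify how "similar to" is evaluated numerically.

```python
def similar(a, b, tol=0.1):
    """Hypothetical test for 'similar to': values within a tolerance."""
    return abs(a - b) <= tol


def rule2(x, y, tol=0.1):
    """Overall appearance: Dimension 1 AND Dimension 2 are similar."""
    return (similar(x["DIM1"], y["DIM1"], tol)
            and similar(x["DIM2"], y["DIM2"], tol))


def rule3(x, y, tol=0.1):
    """Similar pattern: Dimension 2 OR Dimension 3 is similar."""
    return (similar(x["DIM2"], y["DIM2"], tol)
            or similar(x["DIM3"], y["DIM3"], tol))


# Illustrative dimension values for three patterns.
X = {"DIM1": 0.20, "DIM2": 0.75, "DIM3": 0.10}
Y = {"DIM1": 0.25, "DIM2": 0.70, "DIM3": 0.90}
Z = {"DIM1": 0.90, "DIM2": 0.72, "DIM3": 0.50}

x_y_rule2 = rule2(X, Y)  # close in both overall color and orientation
x_z_rule2 = rule2(X, Z)  # overall color differs
x_z_rule3 = rule3(X, Z)  # directionality alone is enough for Rule 3
```

Here X and Y satisfy Rule 2, while X and Z fail Rule 2 but still satisfy Rule 3 because their directionality values are close.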


2.0 Overview of the System 

An illustrative embodiment of an exemplary pattern
retrieval and matching system in accordance with the invention
will now be described. The system utilizes the
above-described basic vocabulary V of color patterns consisting of
Dimensions 1 to 5: V={DIM_1, . . . , DIM_5}, and the grammar
G, i.e., the rules governing the use of the dimensions from
the vocabulary V: G={R_1, R_2, R_3, R_4, R_5}. The illustrative
embodiment of the system will, given an input image A and
a query Q: measure the dimensions DIM_i(A) from the
vocabulary, for i=1, . . . , 5, and for each image B from an
image database, apply rules R_1 through R_5 from G and
obtain corresponding distance measures dist_1(A, B), . . . ,
dist_5(A, B), where dist_i(A, B) is the distance between the
images A and B according to the rule i.

FIG. 1 shows a block diagram of a pattern retrieval and
matching system 10. The system 10 includes a feature
extraction component 12, which measures the dimensions
from vocabulary V, and a similarity measurement component
14, in which similar patterns are found using the rules
from the grammar G. The feature extraction component 12
is designed to extract Dimensions 1 to 4 of pattern similarity.
Dimension 5 (pattern complexity and heaviness) is not
implemented in this illustrative embodiment, since experiments
have shown that people generally use this criterion
only at a higher level of judgment, e.g., while comparing
groups of textures. The similarity measurement component
14 in this embodiment performs a judgment of similarity
according to Rules 1, 2, 3 and 4 from G. Rule 5 is not
supported in the illustrative embodiment, since it is only
used in combination with Dimension 5 at a higher level of
pattern matching, e.g., subdividing a group of patterns into
romantic, abstract, geometric, bold, etc.

It is important to note that the feature extraction component
12 is developed in accordance with a number of
assumptions derived from psychophysical properties of the
human visual system and conclusions extracted from the
above-noted experiment. For example, it is assumed that the
overall perception of color patterns is formed through the
interaction of luminance component L, chrominance component
C and achromatic pattern component AP. The luminance
and chrominance components approximate signal
representation in the early visual cortical areas while the
achromatic pattern component approximates signal representation
formed at higher processing levels, as described in
T. N. Cornsweet, "Visual perception," Academic Press,
Orlando, 1970. Therefore, the feature extraction component
12 simulates a similar mechanism, i.e., it decomposes an
image map into luminance and chrominance components in
the initial stages, and models pattern information later, as
will be described in detail below.

As in the human visual system, a first approximation is 
that each of these components is processed through separate 
pathways. While luminance and chrominance components 
are used for the extraction of color-based information, the 
achromatic pattern component is used for the extraction of 
purely texture-based information. However, one can be 
more precise by accounting for residual interactions along 
the pathways, as described in R. L. DeValois and K. K. 
DeValois, "Spatial Vision," New York: Oxford University 
Press, 1990. The invention accomplishes this by extracting 
the achromatic pattern component from the color 
distribution, instead of using the luminance signal as in 
previous models. Moreover, as will be described below, the 
discrete color distribution is estimated through the use of a 
specially-designed perceptual codebook allowing the inter- 
action between the luminance and chrominance compo- 
nents. 

The feature extraction component 12 extracts features by 
combining the following three major domains: a) a nonori- 
ented luminance domain represented by the luminance com- 
ponent of an image, b) an oriented luminance domain 
represented by the achromatic pattern map, and c) a non- 
oriented color domain represented by the chrominance com- 
ponent. The first two domains are essentially "color blind," 
whereas the third domain carries only the chromatic information.
Additional details regarding these domains can be
found in, e.g., M. S. Livingstone and D. H. Hubel, "Segre- 
gation of form, color, movement and depth: Anatomy, physi- 
ology and perception," Science, Vol. 240, pp. 740-749, 
1988. The domains have been experimentally verified in 
perceptual computational models for segregation of color 
textures, as described in T. V. Papathomas et al., "A human 
vision based computational model for chromatic texture 
segregation," IEEE Transactions on Systems, Man and 
Cybernetics — Part B: Cybernetics, Vol. 27, No. 3, June
1997. In accordance with the invention, purely color-based 
dimensions (1 and 4) are extracted in the nonoriented 
domains and are measured using the color feature vector. 
Texture-based dimensions (2 and 3) are extracted in the 
oriented luminance domain, through the scale-orientation 
processing of the achromatic pattern map. 

The feature extraction component 12 as shown in FIG. 1 
includes processing blocks 20, 22, 24, 26 and 28. Image 
decomposition block 20 transforms an input image into the 
Lab color space and decomposes it into luminance L and 
chrominance C=(a,b) components. Estimation of color dis- 
tribution block 22 uses both L and C maps for color 
distribution estimation and extraction of color features, i.e., 
performs feature extraction along the color-based Dimen- 
sions 1 and 4. Pattern map generation block 24 uses color 
features extracted in block 22 to build the achromatic pattern 
map. Texture primitive extraction and estimation blocks 26 
and 28 use the achromatic pattern map to estimate the spatial 
distribution of texture primitives, i.e., to perform feature 
extraction along texture-based Dimensions 2 and 3. 

The similarity measurement component 14 finds similar 
patterns using the rules from the grammar G. The similarity 
measurement component 14 accesses an image database 30, 
and includes a similarity judging block 32. Given an input 
image A, which may be submitted or selected as part of a 
user query Q, for a designated set of the images in the 
database 30, rules R a through R 4 are applied and corre- 
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sponding distance measures are computed. Then, depending 
on the query Q, a set of best matches is found. 
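The matching step described above can be pictured as a ranking problem. The following is a minimal sketch only: the `distance` callable is a hypothetical stand-in for whichever rule-specific measure is applied, and the dictionary-based database layout and the names `best_matches` and `l1` are assumptions made for illustration, not part of the described system.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Feature = Sequence[float]

def best_matches(query: Feature,
                 database: Dict[str, Feature],
                 distance: Callable[[Feature, Feature], float],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Return the k database entries with the smallest distance to the query."""
    scored = ((name, distance(query, feat)) for name, feat in database.items())
    # nsmallest keeps only k candidates while scanning the whole database.
    return heapq.nsmallest(k, scored, key=lambda item: item[1])

def l1(u: Feature, v: Feature) -> float:
    """Placeholder distance (sum of absolute differences), illustration only."""
    return sum(abs(x - y) for x, y in zip(u, v))

# Toy database of precomputed feature vectors (made-up values).
db = {"stripes": [0.9, 0.1], "dots": [0.2, 0.8], "plaid": [0.6, 0.4]}
matches = best_matches([0.8, 0.2], db, l1, k=2)
```

In practice the placeholder distance would be replaced by the color or texture measure selected by the query.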

3.0 Feature Extraction Based on Color Information 

The feature extraction based on color information will 
now be described in greater detail with reference to FIG. 2. 
FIG. 2 shows the processing of color information, as dis-
tinguished from texture information, in the system 10 of 
FIG. 1. Since color representation is used in the FIG. 1 
system both for the extraction of color-related dimensions 
(color features), and for the construction of the achromatic 
pattern map (used later in texture processing), the feature 
extraction component 12 generates a compact, perceptually- 
based color representation. As shown in FIG. 2, this repre- 
sentation is generated and processed using processing blocks 
40, 42, 44 and 46. In block 40, the input image is trans- 
formed into the Lab color space. This block corresponds to 
the image decomposition block 20 of FIG. 1. In block 42, 
which may be viewed as an element of block 22 of FIG. 1, 
a color distribution is determined using a vector 
quantization-based histogram technique which involves 
reading a color codebook. Block 44, which also may be 
viewed as an element of block 22, extracts significant color 
features from the histogram generated in block 42. Block 46, 
which may be viewed as an element of the similarity judging 
block 32, then performs a color distance calculation to 
determine the perceptual similarity between the determined 
color distribution and the corresponding distribution of an 
image from the database 30. 

3.1 Image Conversion 

The conversion of the input image from RGB to Lab color 
space in block 40 of FIG. 2 will now be described in greater 
detail. An important decision to be made in deriving a color 
feature representation is which color space to use. In order 
to produce a system that performs in accordance with human 
perception, a representation based on human color matching 
may be used. CIE Lab is such a color space, and is described 
in G. Wyszecki and W. S. Stiles, "Color science: Concepts 
and methods, quantitative data and formulae," John Wiley 
and Sons, New York, 1982. The Lab color space was 
designed so that inter-color distances computed using the
L2-norm correspond to subjective color matching data. This
representation is obtained from an RGB representation (or 
any other linear color representation such as YIQ, YUV, etc.) 
by first linearizing the input data, i.e., removing gamma 
correction. Next, the data is transformed into the XYZ color 
space using a linear operator. In the XYZ space, the data is 
normalized with respect to the illumination white point, and 
then converted to the Lab representation via a nonlinear 
transform. Additional details on this conversion process and 
the design of the Lab color space may be found in the 
above-cited G. Wyszecki and W. S. Stiles reference. 
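The conversion chain described above (linearization, linear transform to XYZ, white-point normalization, nonlinear transform) can be sketched as follows. This is an illustration under stated assumptions: the input is taken to be 8-bit sRGB, so the sRGB transfer function and the sRGB-to-XYZ matrix are used, together with the D65 white point. None of these constants are given in the text, and the function names are hypothetical.

```python
from typing import Tuple

# Assumed constants: sRGB primaries and D65 white point (not from the text).
_RGB_TO_XYZ = (
    (0.4124564, 0.3575761, 0.1804375),
    (0.2126729, 0.7151522, 0.0721750),
    (0.0193339, 0.1191920, 0.9503041),
)
_WHITE_D65 = (0.95047, 1.00000, 1.08883)

def _linearize(c8: int) -> float:
    """Remove the (assumed sRGB) gamma correction from an 8-bit value."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def _f(t: float) -> float:
    """CIE Lab nonlinearity: cube root with a linear segment near zero."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0

def rgb_to_lab(r: int, g: int, b: int) -> Tuple[float, float, float]:
    """Convert an 8-bit RGB triple to (L, a, b)."""
    lin = [_linearize(v) for v in (r, g, b)]
    # Linear operator taking linear RGB to XYZ.
    xyz = [sum(m * c for m, c in zip(row, lin)) for row in _RGB_TO_XYZ]
    # Normalize by the white point, then apply the nonlinearity.
    fx, fy, fz = (_f(v / w) for v, w in zip(xyz, _WHITE_D65))
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)
```

With these assumptions, white maps to approximately (100, 0, 0) and black to (0, 0, 0), and Euclidean distance between two outputs is the L2-norm referred to above.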

One potential difficulty with this approach is that for most 
images, the white point is unknown. This problem is avoided 
in the illustrative embodiment by using exclusively the D65
white point, which corresponds to "outdoor daylight"
illumination. As long as all of the images are taken under the same
lighting conditions, this is not a problem. However, its use 
for images taken under other lighting conditions can cause 
some shift in the estimated color distribution. In general, 
these shifts are relatively small and the dominant color 
representation, to be described below, appears to be able to 
accommodate the inaccuracies introduced by the fixed white 
point assumption. It should be noted that images taken under 
strongly colored lighting will generally not be represented 
correctly. 

After determining a perceptually meaningful color
representation for the L2 distance metric, the next step is to
estimate the color distribution in the input image by
computing a histogram of the input color data. This requires
specifying a set of bin centers and decision boundaries.
Since linear color spaces (such as RGB) can be approximated
by 3D cubes, bin centers can be computed by performing
separable, equidistant discretizations along each of the
coordinate axes. Unfortunately, by going to the nonlinear
Lab color space, the volume of all possible colors distorts
from a cube to an irregular cone. Consequently, there is no
simple discretization that can be applied to this volume.
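For comparison, the separable, equidistant discretization that works for a cube-shaped linear space such as RGB can be sketched as below. The bin count and function names are illustrative assumptions; the point is that each axis is binned independently, which is exactly what does not carry over to the irregular Lab volume.

```python
from collections import Counter
from typing import Iterable, Tuple

def rgb_bin(pixel: Tuple[int, int, int],
            bins_per_axis: int = 4) -> Tuple[int, ...]:
    """Map an 8-bit RGB pixel to a bin index per axis (equidistant bins)."""
    width = 256 / bins_per_axis
    # Each coordinate is discretized on its own: the scheme is separable.
    return tuple(min(int(c / width), bins_per_axis - 1) for c in pixel)

def rgb_histogram(pixels: Iterable[Tuple[int, int, int]],
                  bins_per_axis: int = 4) -> Counter:
    """Count pixels per RGB bin."""
    return Counter(rgb_bin(p, bins_per_axis) for p in pixels)
```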

3.2 Histogram Design 

To estimate color distributions in the Lab space, for the
volume which represents valid colors, the set of bin centers
and decision boundaries which minimize some error criterion
are determined. In the Lab color system, the L2-norm
corresponds to perceptual similarity, thus representing the
optimal distance metric for that space. Therefore, to obtain
an optimal set of bin centers and decision boundaries, one
attempts to find Lab coordinates of N bin centers so that the
overall mean-square classification error is minimized. Since
this is the underlying problem in vector quantization (VQ),
the LBG vector quantization algorithm, described in A.
Gersho and R. M. Gray, "Vector quantization and signal
processing," Kluwer Academic Publishers, Boston, 1992,
may be used to obtain a set of codebooks which optimally
represent the valid colors in the Lab space.

In any VQ design, the training data can have a large effect
on the final result. A commonly used VQ design approach
selects training images which either a) are representative of
a given problem, so the codebook is optimally designed for
that particular application, or b) span enough of the input
space, so the resulting codebook can be used in many
different applications. The following problem occurs with
both of these approaches: in order to obtain an accurate
estimation for the distribution of all possible colors, a large
number of training images is required. This results in a
computationally expensive and possibly intractable design
problem. To overcome this problem, the present invention
takes a different approach. Since the system must deal with
arbitrary input, every valid color is assumed to be
equiprobable. Hence, a synthetic set of training data can be
generated by uniformly quantizing the XYZ space. This data
was transformed into the Lab space and then used as input
to a standard VQ design algorithm. This resulted in a set of
codebooks ranging in size from 16 to 512 colors.
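The synthetic-training-data approach can be sketched as follows. This is a simplified illustration, not the design procedure actually used: it runs a plain generalized Lloyd iteration (the core of LBG) with a deterministic initialization instead of LBG splitting, samples a small XYZ box up to an assumed D65 white point without checking which points are valid colors, and uses hypothetical function names throughout.

```python
from typing import List, Tuple

Vec = Tuple[float, float, float]
WHITE = (0.95047, 1.0, 1.08883)  # assumed D65 white point

def xyz_to_lab(x: float, y: float, z: float) -> Vec:
    def f(t: float) -> float:
        d = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > d ** 3 else t / (3 * d * d) + 4 / 29
    fx, fy, fz = f(x / WHITE[0]), f(y / WHITE[1]), f(z / WHITE[2])
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))

def synthetic_training_set(steps: int = 5) -> List[Vec]:
    """Uniformly quantize an XYZ box and map every sample to Lab."""
    grid = [(i + 0.5) / steps for i in range(steps)]
    return [xyz_to_lab(gx * WHITE[0], gy * WHITE[1], gz * WHITE[2])
            for gx in grid for gy in grid for gz in grid]

def sq_dist(u: Vec, v: Vec) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(codebook: List[Vec], v: Vec) -> int:
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], v))

def mse(data: List[Vec], codebook: List[Vec]) -> float:
    """Mean-square quantization error of the data under the codebook."""
    return sum(sq_dist(codebook[nearest(codebook, v)], v)
               for v in data) / len(data)

def design_codebook(data: List[Vec], size: int,
                    iterations: int = 20) -> List[Vec]:
    # Simplified deterministic initialization: evenly spaced samples.
    codebook = [data[(k * len(data)) // size] for k in range(size)]
    for _ in range(iterations):
        cells: List[List[Vec]] = [[] for _ in range(size)]
        for v in data:  # nearest-neighbor partition
            cells[nearest(codebook, v)].append(v)
        # Centroid update; an empty cell keeps its previous center.
        codebook = [tuple(sum(c) / len(cell) for c in zip(*cell))
                    if cell else old
                    for cell, old in zip(cells, codebook)]
    return codebook
```

Because each iteration alternates a nearest-neighbor partition with a centroid update, the mean-square error of the trained codebook cannot exceed that of the initial one.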

A potential drawback of these codebooks is that they are
designed as a global representation of the entire color space
and consequently, there is no structure to the bin centers. In
an embodiment of the invention which allows a user to
interact with the retrieval process, it is desirable for the color
representation to support manipulation of colors in a
"human-friendly" manner. To simulate human performance
in color perception, a certain amount of structure on the
relationships between the L, a, and b components must be
introduced. One possible way to accomplish this is by
separating the luminance L from the chrominance (a,b)
components. In the illustrative embodiment, a one-dimensional
quantization is first applied on luminance values
of the training data, e.g., using a Lloyd-Max quantizer.
Then, after partitioning the training data into slices of similar
luminance, a separate chrominance codebook is designed for
each slice by applying the LBG algorithm to the appropriate
(a,b) components.
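The luminance-first structure can be sketched as below. This is a minimal illustration with hypothetical function names: a one-dimensional Lloyd-Max iteration picks the luminance levels, and the training points are then partitioned into slices whose (a,b) components would each be passed to a separate VQ design step (not repeated here).

```python
from typing import Dict, List, Sequence, Tuple

def lloyd_max_1d(values: Sequence[float], levels: int,
                 iterations: int = 50) -> List[float]:
    """1-D Lloyd-Max: alternate nearest-level assignment and mean update."""
    lo, hi = min(values), max(values)
    # Start from equally spaced levels across the data range.
    centers = [lo + (k + 0.5) * (hi - lo) / levels for k in range(levels)]
    for _ in range(iterations):
        groups: List[List[float]] = [[] for _ in range(levels)]
        for v in values:
            k = min(range(levels), key=lambda j: abs(centers[j] - v))
            groups[k].append(v)
        # An empty group keeps its previous level.
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return sorted(centers)

def slice_by_luminance(lab_points: Sequence[Tuple[float, float, float]],
                       centers: Sequence[float]
                       ) -> Dict[int, List[Tuple[float, float]]]:
    """Group the (a, b) parts of Lab points by nearest luminance level."""
    slices: Dict[int, List[Tuple[float, float]]] = {
        k: [] for k in range(len(centers))}
    for L, a, b in lab_points:
        k = min(range(len(centers)), key=lambda j: abs(centers[j] - L))
        slices[k].append((a, b))
    return slices
```

Keeping one chrominance codebook per luminance slice is what makes it possible to move to a neighboring slice while holding (a,b) approximately fixed.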

This color representation better mimics human perception
and allows the formulation of functional queries such as
looking for "same but lighter color," "paler," "contrasting,"
etc. For example, the formulation of a query vector to search
for a "lighter" color can be accomplished through the
following steps: 1) extract the luminance $L_Q$ and the
$(a_Q, b_Q)$ pair for the query color, 2) find the codebook
for a higher luminance level $L > L_Q$, 3) in this codebook,
find the cell which corresponds to the (a,b) entry which is
the closest to $(a_Q, b_Q)$ in the L2 sense, and 4) retrieve
all images having (L,a,b) as a dominant color. Moreover,
starting from the relationship between L, a, and b values for
a particular color, and its hue H and saturation S,

$$H = \arctan\frac{b}{a}, \qquad S = \sqrt{a^2 + b^2},$$

similar procedures can be applied to satisfy user queries
such as "paler color," "bolder color," "contrasting color,"
etc. Finally, in applications in which the search is performed
between different databases or when the query image is
supplied by the user, separation of luminance and
chrominance allows for elimination of the unequal luminance
condition. Since the chrominance components contain the
information about the type of color regardless of the
intensity value, color features can be extracted only in the
chrominance domain C(i,j)={a(i,j),b(i,j)}, for the
corresponding luminance level, thus allowing for comparison
between images of different quality.

3.3 Color Feature Extraction

Color histogram representations based on color codebooks
have been widely used as a feature vector in image
segmentation and retrieval, as described in, e.g., M. Ioka, "A
method of defining the similarity of images on the basis of
color information," Technical Report RT-0030, IBM
Research, Tokyo Research Laboratory, November 1989, and
M. Swain and D. Ballard, "Color indexing," International
Journal of Computer Vision, Vol. 7, No. 1, 1991. Although
good results have been reported, a feature set based solely on
the image histogram may not provide a reliable
representation for pattern matching and retrieval. This is due
to the fact that most patterns are perceived as combinations
of a few dominant colors. For that reason, the illustrative
embodiment of the invention utilizes color features and
associated distance measures comprising the subset of colors
which best represent an image, augmented by the area
percentage in which each of these colors occurs.

One implementation of the system 10 of FIG. 1 uses a
codebook with N=71 colors denoted by
$C_{71} = \{C_1, C_2, \ldots, C_{71}\}$, where each color
$C_i = \{L_i, a_i, b_i\}$ is a three-dimensional Lab vector.
As the first step in the feature extraction procedure (before
histogram calculation), the input image is convolved with a
B-spline smoothing kernel. This is done to refine contours of
texture primitives and foreground regions, while eliminating
most of the background noise. The B-spline kernel is used
since it provides an optimal representation of a signal in the
L2 sense, hence minimizing the perceptual error, as described
in M. Unser et al., "Enlargement or reduction of digital
images with minimum loss of information," IEEE Trans.
Image Processing, Vol. 4, pp. 247-257, March 1995. The
second step (after the histogram of an image is generated)
involves extraction of dominant colors to find colors from
the codebook that adequately describe a given texture
pattern. This was implemented by sequentially increasing the
number of colors until all colors covering more than 3% of
the image area have been extracted. The remaining pixels
were represented with their closest matches (in an L2 sense)
from the extracted dominant colors. Finally, the percentage
of each dominant color was calculated and the color feature
vectors were obtained as

$$f_c = \{(i_j, p_j) \mid j \in [1, N],\ p_j \in [0, 1]\},$$

where $i_j$ is the index in the codebook, $p_j$ is the
corresponding percentage and N is the number of dominant
colors in the image. A similar representation has been
successfully used in image retrieval, as described in W. Y. Ma
et al., "Tools for texture/color based search of images,"
Proc. of SPIE, Vol. 3016, 1997.

The above-described feature extraction of the present
invention has several advantages. For example, it provides
an optimal representation of the original color content by
minimizing the MSE introduced when using a small number
of colors. Then, by exploiting the fact that the human eye
cannot perceive a large number of colors at the same time,
nor is it able to distinguish close colors well, a very compact
feature representation is used. This greatly reduces the size
of the features needed for storage and indexing.
Furthermore, because of the codebook used, this
representation facilitates queries containing an overall
impression of patterns expressed in a natural way, such as
"find me all blue-yellow fabrics," "find me the same color,
but a bit lighter," etc. Finally, in addition to storing the
values of the dominant colors and their percentages, the
system also stores the actual number of dominant colors.
This information is useful in addressing the more complex
dimensions of pattern similarities, e.g., searching for simple
and single colored patterns, versus heavy, multicolored ones.

3.4 Color Metric

The color features described above, represented as color
and area pairs, allow the definition of a color metric that
closely matches human perception. The idea is that the
similarity between two images in terms of color composition
should be measured by a combination of color and area
differences. Given two images, a query image A and a target
image B, with $N_A$ and $N_B$ dominant colors, and feature
vectors $f_c(A) = \{(i_a, p_a) \mid a \in [1, N_A]\}$ and
$f_c(B) = \{(i_b, p_b) \mid b \in [1, N_B]\}$, respectively,
the similarity between these two images is first defined in
terms of a single dominant color. Suppose that i is the
dominant color in image A. Then, the similarity between A
and B is measured in terms of that color using the minimum
of distance measures between the color element (i, p) and the
set of elements $\{(i_b, p_b) \mid b \in [1, N_B]\}$:

$$d(i, B) = \min_{b \in [1, N_B]} D_c\big((i, p), (i_b, p_b)\big),$$

where

$$D_c\big((i, p), (i_b, p_b)\big) = |p - p_b| + \sqrt{(L - L_b)^2 + (a - a_b)^2 + (b - b_b)^2}.$$

Once the distance d(i,B) has been calculated, besides its
value, we also use its argument to store the color value from
B that, for a particular color i from A, minimizes d(i,B). We
denote this color value by k(i,B) as:

$$k(i, B) = \arg d(i, B).$$

Note that the distance between two color/area pairs is
defined as the sum of the distance in terms of the area
percentage and the distance in the Lab color space, both
within the range [0,1]. The above-cited W. Y. Ma et al.
reference used a different definition where the overall
distance is the product of these two components. That definition
has the drawback that when either component distance is
very small the remaining component becomes irrelevant.
Consider the extreme case, when the color distance between
two color/area pairs is 0. This is not unusual since the color
space has been heavily quantized. Then, even if the
difference between the two area percentages is very large, the
overall distance is 0, yielding a measure that does not match
human perception. The illustrative embodiment of the
invention provides a simple and effective remedy to that
problem, namely, it guarantees that both color and area
components contribute to the perception of color similarity.

Given the distance between two images in terms of one
dominant color as defined above, the distance in terms of
overall color composition is defined as the sum over all
dominant colors from both images, in the following way: 1)
for image A, $\forall a \in [1, N_A]$ find $k_A(i_a, B)$ and
the corresponding distance $d(i_a, B)$, 2) repeat this
procedure for all dominant colors in B, that is,
$\forall b \in [1, N_B]$ find $k_B(i_b, A)$ and $d(i_b, A)$,
and 3) calculate the overall distance as

$$dist(A, B) = \sum_{a \in [1, N_A]} d(i_a, B) + \sum_{b \in [1, N_B]} d(i_b, A).$$

Other types of distance calculations could also be used to
generate a color metric in accordance with the invention.

4.0 Feature Extraction Based on Texture Information

The feature extraction based on texture information will
now be described in greater detail with reference to FIG. 3.
FIG. 3 shows the processing of texture information, as
distinguished from color information, in the system 10 of
FIG. 1. As shown in FIG. 3, this representation is generated
and processed using processing blocks 50, 51, 52, 54, 56 and
58. In block 50, the achromatic pattern map is generated
from the color feature vector, after spatial smoothing to
refine texture primitives and remove background noise. This
block corresponds to the pattern map generation block 24 of
FIG. 1. In block 51, which may be viewed as an element of
block 26 of FIG. 1, the edge map is built from the achromatic
pattern map. Block 52 applies a nonlinear mechanism to
suppress nontextured edges. Block 54 performs orientation
processing to extract the distribution of pattern contours
along different spatial directions. Blocks 52 and 54 may be
viewed as elements of block 26 of FIG. 1. Block 56, which
corresponds to block 28 of FIG. 1, computes a scale-spatial
estimation of texture edge distribution. Block 58, which may
be viewed as an element of the similarity judging block 32,
then performs a texture distance calculation to determine the
perceptual similarity between the determined texture edge
distribution and the corresponding distribution of an image
from the database 30.

The achromatic map in block 50 is obtained in the
following manner: For a given texture, by using the number
of its dominant colors N, a gray level range of 0 to 255 is
discretized into N levels. Then, dominant colors are mapped
into gray levels according to the following rule: Level 0 is
assigned to the dominant color with the highest percentage
of pixels, the next level is assigned to the second dominant
color, etc., until the level 255 has been assigned to a
dominant color with the lowest area percentage. In other
words, the achromatic pattern map models the fact that
human perception and understanding of form, shape and
orientation is completely unrelated to color. Furthermore, it
resolves the problem of secondary interactions between the
luminance and chrominance pathways. As an example,
consider a pair of textures in which the values in the
luminance map are much higher for one of the textures,
hence the edge amplitudes and edge distributions are
different for the two corresponding images. Moreover, the
dominant colors are not close, which makes the
classification of these two patterns as similar (either using
luminance, chrominance or color features) extremely
difficult. However, in the above-described model, the way that
luminance and chrominance are coupled into a single pattern
map guarantees that both textures will have identical
achromatic pattern maps, leading to almost identical texture
feature vectors.

The objective of edge and orientation processing in blocks
51, 52 and 54 is to extract information about the pattern
contours from the achromatic pattern map. Instead of
applying a bank of oriented filters, as in previous models, the
illustrative embodiment of the present invention computes
polar edge maps and uses them to extract the distribution of
edges along different directions. This approach makes it
possible to obtain the edge distribution for an arbitrary
orientation with low computational cost. It also introduces
a certain flexibility in the extraction of texture features since,
if necessary, the orientation selectivity can be enhanced by
choosing an arbitrary number of orientations. In the
illustrative system 10, edge-amplitude and edge-angle maps,
calculated at each image point, are used. Edge maps were
obtained by convolving an input achromatic pattern map
with the horizontal and vertical derivatives of a Gaussian
and converting the result into polar coordinates. The
derivatives of a Gaussian along x and y axes were computed as

$$g_x(i, j) = i\, e^{-(i^2 + j^2)}, \qquad g_y(i, j) = j\, e^{-(i^2 + j^2)},$$

while the derivatives of the achromatic pattern map along x
and y axes were computed as

$$A_x(i, j) = (g_x * A_c)(i, j), \qquad A_y(i, j) = (g_y * A_c)(i, j),$$

where $A_c(i, j)$ denotes the achromatic pattern map and *
stands for two-dimensional convolution. These derivatives
were then transformed into their polar representation as:

$$A(i, j) = \sqrt{A_x(i, j)^2 + A_y(i, j)^2},$$

$$\theta(i, j) = \tan^{-1}\frac{A_y(i, j)}{A_x(i, j)}, \qquad \theta(i, j) \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right].$$

Texture phenomenon is created through the perception of
edges along different directions, over different scales.
Hence, to estimate the placement and organization of
texture primitives, information about the edge strength at a
certain location is not needed; rather, it is only necessary to
know a) whether an edge exists at this point, and b) the
direction of the edge. Therefore, after the transformation
into the polar representation, the amplitude map is
nonlinearly processed as:

$$A_Q(i, j) = \begin{cases} 1, & \mathrm{med}(A(i, j)) > T \\ 0, & \mathrm{med}(A(i, j)) \le T \end{cases}$$

where med(.) represents the median value calculated over a
5x5 neighborhood. Nonlinear median operation was
introduced to suppress false edges in the presence of stronger
ones, and eliminate weak edges introduced by noise. The
quantization threshold T is determined as:

$$T = \mu_A - 2\sqrt{\sigma_A^2},$$

where $\mu_A$ and $\sigma_A^2$ are the mean and variance of
the edge amplitude, estimated on a set of 300 images. This selection
allowed all the major edges to be preserved. After quantizing
the amplitude map, the discretization of the angle space is
performed, dividing it into the six bins corresponding to
directions 0°, 30°, 60°, 90°, 120° and 150°, respectively. For
each direction an amplitude map $A_{\theta_i}(i, j)$ is built as:

$$A_{\theta_i}(i, j) = \begin{cases} 1, & A_Q(i, j) = 1 \wedge \theta(i, j) \in \theta_i \\ 0, & A_Q(i, j) = 0 \vee \theta(i, j) \notin \theta_i \end{cases}$$

where "∧" denotes a logic "and" operator and "∨" denotes a
logic "or" operator. The $\theta_i$ in this example correspond
to the six directions identified above.

To address the textural behavior at different scales, mean
and variance of edge density distribution are estimated, by
applying overlapping windows of different sizes to the set of
directional amplitude maps. For a given scale, along a given
direction, edge density is calculated simply by summing the
values of the corresponding amplitude map within the
window, and dividing that value by the total number of
pixels in the window. Four scales were used in the
illustrative embodiment, with the following parameters for
the sliding window:

Scale 1: WS_1 = .75W x .75H, N_1 = 30,
Scale 2: WS_2 = .40W x .40H, N_2 = 56,
Scale 3: WS_3 = .20W x .20H, N_3 = 80,
Scale 4: WS_4 = .10W x .10H, N_4 = 224,

where WS_i and N_i are window size and number of windows
for scale i, and W and H are the width and height of the input
texture. Note that the above approach is scale (zoom)
invariant. In other words, the same pattern at different scales
will have similar feature vectors.

The output of the above-described texture processing
block 56 is a texture feature vector of length 48:

$$f_t = [\mu_1^{\theta_1}\ \sigma_1^{\theta_1}\ \mu_1^{\theta_2}\ \sigma_1^{\theta_2}\ \ldots\ \mu_4^{\theta_6}\ \sigma_4^{\theta_6}],$$

where $\mu_i^{\theta_j}$ and $\sigma_i^{\theta_j}$ stand for
mean and standard deviation of texture edges at scale i along
the direction $\theta_j$. Each feature component may be
normalized so that it assumes the mean value of 0 and
standard deviation of 1 over the whole database. In that way
this feature vector essentially models both texture-related
dimensions (directionality and regularity): the distribution
estimates along the different directions address the
dimension of directionality. At any particular scale, the
mean value can be understood as an estimation of the overall
pattern quality, whereas the standard deviation estimates the
uniformity, regularity and repetitiveness at this scale, thus
addressing the dimension of pattern regularity.

4.1 Texture Metric

As previously mentioned, at any particular scale, the
mean values measure the overall edge pattern and the
standard deviations measure the uniformity, regularity and
repetitiveness at this scale. The above-noted experiments
demonstrated that the perceptual texture similarity between
two images is a combination of these two factors in the
following way: if two textures have very different degrees of
uniformity they are immediately perceived as different. On
the other hand, if their degrees of uniformity, regularity and
repetitiveness are close, their overall patterns should be
further examined to judge similarity. The smooth transition
between these two factors can be implemented using an
exponential function. Thus, the distance between the query
image A and the target image B, with texture feature vectors

$$f_t(A) = [\mu_{1A}^{\theta_1}\ \sigma_{1A}^{\theta_1}\ \ldots\ \mu_{4A}^{\theta_6}\ \sigma_{4A}^{\theta_6}] \quad \text{and} \quad f_t(B) = [\mu_{1B}^{\theta_1}\ \sigma_{1B}^{\theta_1}\ \ldots\ \mu_{4B}^{\theta_6}\ \sigma_{4B}^{\theta_6}],$$

respectively, is defined as:

$$M_i^{\theta_j} = \left|\mu_{iA}^{\theta_j} - \mu_{iB}^{\theta_j}\right|, \qquad D_i^{\theta_j} = \left|\sigma_{iA}^{\theta_j} - \sigma_{iB}^{\theta_j}\right|,$$

$$d_i^{\theta_j} = w_M(i, \theta_j)\, M_i^{\theta_j} + w_D(i, \theta_j)\, D_i^{\theta_j} = \frac{e^{-\alpha (D_i^{\theta_j} - D_o)}}{1 + e^{-\alpha (D_i^{\theta_j} - D_o)}}\, M_i^{\theta_j} + \frac{1}{1 + e^{-\alpha (D_i^{\theta_j} - D_o)}}\, D_i^{\theta_j},$$

$$dist(A, B) = \sum_i \sum_j d_i^{\theta_j}.$$

At each scale i and direction $\theta_j$, the distance function
is the weighted sum of two terms: the first, $M_i^{\theta_j}$,
measuring the difference in mean edge density and the
second, $D_i^{\theta_j}$, measuring the difference in standard
deviation, or regularity. The weighting factors,
$w_M(i, \theta_j)$ and $w_D(i, \theta_j)$, are designed such
that when the difference in standard deviation is small, the
first term is more dominant; as it increases, the second term
becomes dominant, thus matching human perception as
stated above.

The parameters $\alpha$ and $D_o$ control the behavior of the
weighting factors, where $\alpha$ controls the sharpness of the
transition, and $D_o$ defines the transition point. These two
parameters were trained in the illustrative embodiment using
40 images taken from an interior design database, in the
following way. First, 10 images were selected as
representatives of the database. Then, for each
representative, 3 comparison images were chosen as the most
similar, close and least similar to the representative. For
each representative image $I_i$, $i = 1, \ldots, 10$, the 3
comparison images $C_{i,j}$, $j = 1, \ldots, 3$, are ordered in
decreasing similarity. Thus, sets $\{I_i\}$ and $\{C_{i,j}\}$
represent the ground truth. For any given set of parameters
$(\alpha, D_o)$, the rankings of the comparison images as
given by the distance function can be computed. Let
$rank_{ij}(\alpha, D_o)$ represent the ranking of the
comparison image $C_{i,j}$ for representative image $I_i$.
Ideally, one would like to achieve

$$rank_{ij}(\alpha, D_o) = j, \qquad \forall i \in [1, 10],\ j \in [1, 3].$$

The deviation from ground truth is computed as

$$D(\alpha, D_o) = \sum_{i=1}^{10} d_i(\alpha, D_o),$$

where

$$d_i(\alpha, D_o) = \sum_{j=1}^{3} \left| dist(I_i, C_{i,j}) - dist\big(I_i, C_{i,\,rank_{ij}(\alpha, D_o)}\big) \right|.$$

The goal of the above-described parameter training is to
minimize the function $D(\alpha, D_o)$. Many standard
optimization algorithms can be used to achieve this. For
example, Powell's algorithm, as described in William H.
Press et al., "Numerical Recipes in C," 2nd edition, pp.
412-420, Cambridge University Press, New York, 1992, was
used in the illustrative embodiment, and the optimal
parameters derived were $\alpha = 10$ and $D_o = 0.95$.

5.0 Similarity Measurement

As previously noted, the similarity measurement
component 14 in system 10 of FIG. 1 performs similarity
measurements based on the rules from the above-described
grammar G. The system was tested on a number of exem- 
plary databases, including a wide variety of different pattern 
images including photographs, interior design, architectural 
surfaces, historic ornaments and oriental carpets. The appli- 
cation of the four rules, Rules 1 to 4, of the grammar G, is 
described in greater detail below. 

APPLYING RULE 1 (equal pattern): Regardless of color, 
two textures with exactly the same pattern are always judged 
to be similar. Hence, this rule concerns the similarity only in 
the domain of texture features, without actual involvement 
of any color-based information. Therefore, this rule is imple- 
mented by comparing texture features only, using the above- 
described texture metric. The same search mechanism sup- 
ports Rule 3 (similar pattern) as well. According to that rule, 
two patterns that are dominant along the same directions are 
seen as similar, regardless of their color. In the same manner, 
textures with the same placement or repetition of the struc- 
tural element are seen as similar, even if the structural 
element is not exactly the same. Hence, the value of the 
distance function in the texture domain reflects either pattern 
identity or pattern similarity. For example, very small dis- 
tances mean that two patterns are exactly the same (implying 
that the rule of identity was used), whereas somewhat larger 
distances imply that the similarity was judged by the less 
rigorous rules of equal directionality or regularity. 
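A texture-only comparison of this kind can be sketched as follows. The sketch rests on assumptions that should be checked against the texture metric of Section 4.1: feature vectors are taken to be flat sequences of (mean, standard deviation) pairs, one pair per scale and direction, and the smooth transition between the two terms is modeled with a logistic weight w_D = 1 / (1 + exp(-alpha (D - d0))) and w_M = 1 - w_D, using the reported values alpha = 10 and d0 = 0.95 as defaults. The function name is hypothetical.

```python
import math
from typing import Sequence

def texture_distance(f_a: Sequence[float], f_b: Sequence[float],
                     alpha: float = 10.0, d0: float = 0.95) -> float:
    """Sum of weighted mean/std differences over all (scale, direction) pairs."""
    if len(f_a) != len(f_b) or len(f_a) % 2:
        raise ValueError("feature vectors must have equal, even length")
    total = 0.0
    for k in range(0, len(f_a), 2):
        m = abs(f_a[k] - f_b[k])          # difference in mean edge density
        d = abs(f_a[k + 1] - f_b[k + 1])  # difference in standard deviation
        # Assumed logistic weight: close to 0 for small d, close to 1 for
        # large d, so regularity differences dominate once they are large.
        w_d = 1.0 / (1.0 + math.exp(-alpha * (d - d0)))
        w_m = 1.0 - w_d
        total += w_m * m + w_d * d
    return total
```

Under these assumptions, identical feature vectors give a distance of exactly zero, which corresponds to the "equal pattern" case, while small nonzero values correspond to the less rigorous similarity judgments.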

APPLYING RULE 2 (overall appearance): The actual 
implementation of this rule involves comparison of both 
color and texture features. Therefore, the search is first 
performed in the texture domain, using the above-described 
texture features and metrics. A set of selected patterns is then 
subjected to another search, this time in the color domain, 
using the above-described color features and color metrics. 
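The two-stage search for Rule 2 can be sketched as a simple cascade. The record layout, the function name, and the `shortlist_size` parameter are illustrative assumptions; the text does not state how many patterns survive the first stage. The two distance callables stand in for the texture and color metrics.

```python
from typing import Any, Callable, Dict, List

def overall_appearance_search(query: Dict[str, Any],
                              database: Dict[str, Dict[str, Any]],
                              texture_dist: Callable[[Any, Any], float],
                              color_dist: Callable[[Any, Any], float],
                              shortlist_size: int = 10,
                              k: int = 3) -> List[str]:
    """Shortlist by texture distance, then re-rank the shortlist by color."""
    # Stage 1: search in the texture domain.
    by_texture = sorted(
        database,
        key=lambda n: texture_dist(query["texture"], database[n]["texture"]))
    shortlist = by_texture[:shortlist_size]
    # Stage 2: search the selected patterns in the color domain.
    by_color = sorted(
        shortlist,
        key=lambda n: color_dist(query["color"], database[n]["color"]))
    return by_color[:k]

# Toy example with scalar features and absolute difference as distance.
diff = lambda u, v: abs(u - v)
db = {
    "a": {"texture": 0.1, "color": 0.9},
    "b": {"texture": 0.2, "color": 0.1},
    "c": {"texture": 0.9, "color": 0.1},
}
result = overall_appearance_search({"texture": 0.0, "color": 0.0},
                                   db, diff, diff, shortlist_size=2, k=1)
```

In the toy example, pattern "c" matches the query color well but is excluded by the texture stage, so the result is "b".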

APPLYING RULE 3 (similar pattern): The same mecha- 
nism as in Applying Rule 1 is used here. 

APPLYING RULE 4 (dominant color): According to the 
rule of dominant color, two patterns are perceived as similar 
if they possess the same color distributions regardless of
texture quality, texture content, directionality, placement or 
repetition of a structural element. This also holds for patterns 
that have the same dominant or overall color. Hence, this 
rule concerns only similarity in the color domain, and is 
applied by comparing color features only. 
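A color-only comparison in the spirit of the color metric of Section 3.4 can be sketched as below. Assumptions are labeled in the code: dominant-color features are stored as (Lab triple, area fraction) pairs rather than codebook indices, and a hypothetical `lab_scale` constant brings the Lab distance roughly into [0, 1], since the normalization is not specified in the text.

```python
import math
from typing import List, Sequence, Tuple

ColorArea = Tuple[Tuple[float, float, float], float]  # (Lab, area fraction)

def pair_distance(ca: ColorArea, cb: ColorArea,
                  lab_scale: float = 100.0) -> float:
    """Area difference plus (assumed-normalized) Lab distance."""
    (lab_a, p_a), (lab_b, p_b) = ca, cb
    color = math.sqrt(sum((x - y) ** 2 for x, y in zip(lab_a, lab_b)))
    return abs(p_a - p_b) + color / lab_scale

def color_to_image(c: ColorArea, feats: Sequence[ColorArea],
                   lab_scale: float = 100.0) -> float:
    """d(i, B): best match of one dominant color within another image."""
    return min(pair_distance(c, other, lab_scale) for other in feats)

def color_distance(feats_a: Sequence[ColorArea],
                   feats_b: Sequence[ColorArea],
                   lab_scale: float = 100.0) -> float:
    """Sum over dominant colors of A matched in B, plus B matched in A."""
    return (sum(color_to_image(c, feats_b, lab_scale) for c in feats_a)
            + sum(color_to_image(c, feats_a, lab_scale) for c in feats_b))

A: List[ColorArea] = [((50, 0, 0), 0.6), ((80, 10, 10), 0.4)]
B: List[ColorArea] = [((50, 0, 0), 0.1)]
```

Because the two components are added rather than multiplied, the pair ((50, 0, 0), 0.6) and ((50, 0, 0), 0.1) still has a nonzero distance even though the colors are identical.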

6.0 Query Types and Other Search Examples 

As explained previously, one of the assumptions about the 
model used in the illustrative embodiment is that chromatic 
and achromatic components are processed through mostly 
separate pathways. Hence by separating color representation 
and color metric from texture representation and texture 
metric, the invention provides a system with a significant 
amount of flexibility in terms of manipulation of image 
features. This is an extremely important issue in many 
practical applications since it allows for different types of 
queries. As input into the system the user may be permitted 
to supply: a) a query and b) patterns to begin the search. The 
rules given above model typical human queries, such as: 
"find the same pattern" (Rule 1), "find all patterns with 
similar overall appearance" (Rule 2), "find similar patterns" 
(Rule 3), and "find all patterns of similar color", "find all 
patterns of a given color", and "find patterns that match a 
given pattern" (Rule 4). Moreover, due to the way the color 
codebook of the invention is designed, the system supports 
additional queries such as: "find darker patterns," "find more 
saturated patterns," "find simple patterns," "find
multicolored patterns," "find contrasting patterns," etc. An
input pattern can be, e.g., supplied by the user, selected
from a database, given in the form of a sketch, or provided by
any other suitable technique. If the user has color
preferences, they can be specified either from the color
codebook, or from another pattern.

As an example, consider a query in which the user 
provides an input pattern in the form of a sketch. There are 
certain situations when the user is unable to supply an image 
of the pattern he or she is trying to find. Hence, instead of 

io requiring the user to browse through the database manually, 
the system may provide tools for sketching the pattern and 
formulating a query based on the obtained bitmap image. In 
that case, without any lowpass prefiltering, only a texture 
feature vector is computed for the bitmap image and used in
the search. Furthermore, this search mechanism may allow
the user to specify a desired color, by selecting a color
$i = \{L_i, a_i, b_i\}$ from the color codebook. Then, the search is
performed in two iterations. First, a subset of patterns is
selected based on color similarity. Color similarity between
the color i and target image B, with the color feature vector
$f_c(B) = \{(i_j, p_j) \mid j \in [1, N_B]\}$, is calculated as

$$d(i, B) = \min_{j \in [1, N_B]} D_c(i, i_j).$$

Next, within the selected set, a search based on texture 
features is performed to select the best match. A similar
search mechanism is applied for a combination query, where
the desired pattern is taken from one input image and the 
desired color from another image, or in a search where the 
desired pattern is specified by an input image and the desired 
color is selected from the color map. 
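
The two-iteration search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the distance between two codebook colors is assumed to be the Euclidean distance between their (L, a, b) triples, texture vectors are compared with a plain Euclidean distance, the first iteration keeps a fixed number of candidates (`keep`), and the function names and database layout are hypothetical.

```python
import math


def color_to_image_distance(color, dominant_colors):
    """Smallest distance from the chosen color to any dominant color of an image.

    Colors are (L, a, b) triples; Euclidean distance is an assumption here.
    """
    return min(math.dist(color, c) for c in dominant_colors)


def two_stage_search(color, texture_query, database, keep=3):
    """Return the database entry that best matches the sketch and the color.

    database: list of dicts with keys 'name', 'colors' (list of Lab triples)
    and 'texture' (feature vector). Iteration 1 keeps the `keep` entries
    closest in color; iteration 2 picks the kept entry whose texture vector
    is nearest to the query texture vector.
    """
    by_color = sorted(
        database, key=lambda e: color_to_image_distance(color, e["colors"])
    )
    shortlist = by_color[:keep]
    return min(shortlist, key=lambda e: math.dist(texture_query, e["texture"]))


if __name__ == "__main__":
    db = [
        {"name": "red_stripes", "colors": [(50.0, 60.0, 40.0)], "texture": [0.9, 0.1]},
        {"name": "red_dots", "colors": [(52.0, 58.0, 42.0), (95.0, 0.0, 0.0)], "texture": [0.2, 0.8]},
        {"name": "blue_stripes", "colors": [(30.0, 10.0, -50.0)], "texture": [0.9, 0.1]},
    ]
    best = two_stage_search((51.0, 59.0, 41.0), [0.85, 0.15], db, keep=2)
    print(best["name"])
```

In this toy database the blue striped pattern has the closest texture but is dropped in the first iteration, so the texture search only chooses among the two reddish patterns.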

FIG. 4 shows an exemplary communication system application
of the pattern retrieval and matching system 10 of
FIG. 1. The communication system 100 includes a number
of user terminals 102-i, i=1, 2, . . . N and a number of servers
104-i, i=1, 2, . . . M. The user terminals 102-i and servers
104-i communicate over a network 106. The user terminals
102-i may represent, e.g., desktop, portable or palmtop
computers, workstations, mainframe or microcomputers,
television set-top boxes, or any other suitable type of
communication terminal, as well as portions or combinations of
such terminals.

The servers 104-i may be, e.g., computers, workstations, 
mainframe or microcomputers, etc. or various portions or 
combinations thereof. One or more of the servers 104-i may 
be co-located with one or more of the user terminals 102-i,
or geographically remote from all of the user terminals
102-i, depending on the specific implementation of the 
system 100. The network 106 may be, e.g., a global com- 
munication network such as the Internet, a wide area 
network, a local area network, a cable, telephone, wireless or
satellite network, as well as portions or combinations of
these and other networks. Each of the user terminals 102-i 
may include a processor 110 and a memory 112, and each of 
the servers 104-i may include a processor 114 and a memory 
116. The processors 110, 114 and memories 112, 116 may be
configured in a well-known manner to execute stored program
instructions to carry out various features of the invention
as previously described.

In operation, a user at one of the user terminals 102-i 
enters a query regarding a pattern for which the user desires
to find matching information in a database accessible by one
or more of the servers 104-i. FIG. 5 is a flow diagram 
illustrating an example of this process as carried out in the 
communication system of FIG. 4. In step 120, the user
utilizes a web browser or other suitable program running in 
terminal 102-i to log on to a web page associated with a 
source of pattern information and accessible over the network
106. The web page may be supported by one or more
of the servers 104-i. The user in step 122 selects from the 
web page a database or set of databases which the user 
would like to search. If the user does not specify a particular 
database, all of the databases associated with the web page 
may be searched. In step 124, the user supplies a query
image on which the search will be based. The query image
may be an image selected from a catalog accessible through
the web page, or a scanned image supplied by the user, e.g., in
the form of a sketch or other previously scanned or downloaded
image. The user in step 126 defines a query, i.e.,
specifies the other parameters of the search, such as the type 
of matching patterns that are of interest, the number of 
matches desired, etc. 

The user then launches the search by, e.g., clicking an 
appropriate button or icon on the web page. The query and
query image are then supplied over the network 106 to an 
appropriate one of the servers 104-i. In this embodiment, it 
is assumed that the system 10 of FIG. 1 is implemented by 
appropriate programming of one or more of the servers 
104-i. The system responds in step 130 by displaying to the
user at terminal 102-i a specified number of the best 
matches. In step 132, the user can continue the process by 
modifying the search, launching another search, e.g., with a 
new query image or set of query parameters, or can exit the 
system.
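
The server side of this flow can be sketched as below, as a rough illustration only. The function `run_query`, its parameters, and the dictionary layout of the databases are hypothetical; the `distance` callable stands in for whatever distance measure the system 10 produces for a pair of images.

```python
def run_query(query_features, databases, distance, selected=None, num_matches=5):
    """Return the num_matches closest entries as (distance, database, item_id).

    databases: dict mapping a database name to a dict of item_id -> features.
    selected: names chosen by the user in step 122; if empty or None, all
    databases are searched, as the text describes.
    """
    names = selected if selected else list(databases)
    candidates = []
    for name in names:
        for item_id, features in databases[name].items():
            candidates.append((distance(query_features, features), name, item_id))
    # Best matches first (smallest distance), as displayed in step 130.
    candidates.sort(key=lambda c: c[0])
    return candidates[:num_matches]


if __name__ == "__main__":
    databases = {
        "fabrics": {"f1": [0.1, 0.2], "f2": [0.9, 0.9]},
        "wallpapers": {"w1": [0.15, 0.25]},
    }

    def l1(a, b):
        # Placeholder distance used only for this demonstration.
        return sum(abs(x - y) for x, y in zip(a, b))

    for dist, name, item_id in run_query([0.1, 0.2], databases, l1, num_matches=2):
        print(name, item_id, round(dist, 3))
```

Modifying the search in step 132 then amounts to calling the same function again with a different query image, database selection, or number of matches.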

It should be noted that the particular implementation of 
the communication system 100 will vary depending on the 
specific application. For example, in certain applications,
such as interior design stores or other facilities, it may be
desirable to have the user terminals geographically co-located
with one or more of the servers. In an Internet-based
application, the user terminals may represent personal
computers at the users' homes or offices, and the servers may
represent, e.g., a server cluster at a remote location designed
to process a large number of user queries received from
around the world.
Many other applications are of course possible. 

The invention has been described above in conjunction 
with an illustrative embodiment of a pattern retrieval and 
matching system. However, it should be understood that the 
invention is not limited to use with the particular configurations
shown. For example, other embodiments of the
invention may take into account image content or domain-specific
information in performing image retrieval and
matching. In addition, the invention can be applied to other
types of information signals, such as, for example, video
information signals in the form of sequences of video 
frames. Numerous other alternative embodiments within the 
scope of the following claims will be apparent to those 
skilled in the art. 

What is claimed is:

1. A system for processing information signals, the system 
comprising: 

a processor operative to compare a first information signal 
to a second information signal in response to a user 
query, wherein the first information signal is selected
by the user and the second information signal is stored 
in a database associated with the system, wherein the 
processor extracts color and texture information from at 
least the first information signal using a predetermined 
vocabulary comprising one or more dimensions, and
generates a distance measure characterizing the rela- 
tionship of the first information signal to the second 
information signal by applying a grammar comprising
a set of predetermined rules to the color and texture 
information extracted from the first information signal 
and corresponding color and texture information asso- 
ciated with the second information signal, wherein the 
processor receives the first information signal in the 
form of an input image A submitted in conjunction with 
a query from the user, and wherein the processor is 
further operative to measure dimensions $\mathrm{DIM}_i(A)$ from
the vocabulary, for i=1, . . . , N, and for each image B
from an image database, to apply rules $R_i$ from the
grammar to obtain corresponding distance measures
$\mathrm{dist}_i(A, B)$, where $\mathrm{dist}_i(A, B)$ is the distance between
the images A and B according to the rule i.

2. The system of claim 1 wherein at least one of the first 
and second information signals comprises an image. 

3. The system of claim 1 wherein at least one of the first 
and second information signals comprises a sequence of 
video frames. 

4. The system of claim 1 wherein the vocabulary com- 
prises one or more of the following dimensions: overall 
color, directionality and orientation, regularity and 
placement, color purity, and pattern complexity and 
heaviness, and the system generates for at least one of the 
first and second information signals a set of values associ- 
ated with one or more of the dimensions. 

5. The system of claim 4 wherein the system applies the 
rules to the set of values. 

6. The system of claim 5 wherein the equal pattern rule is
a function of the dimension of directionality and orientation 
and the dimension of regularity and placement. 

7. The system of claim 5 wherein the overall appearance 
rule is a function of the dimension of overall color and the 
dimension of directionality and orientation. 

8. The system of claim 5 wherein the similar pattern rule 
is a function of at least one of the dimension of directionality 
and orientation and the dimension of regularity and place- 
ment. 

9. The system of claim 5 wherein the dominant color rule 
is a function of the dimension of overall color. 

10. The system of claim 5 wherein the general impression 
rule is a function of the dimension of color purity and the 
dimension of pattern complexity. 

11. The system of claim 5 wherein each rule is expressed 
as a logical combination of one or more of the values 
generated for at least a subset of the dimensions. 

12. The system of claim 1 wherein the processor is further 
operative to extract an achromatic pattern map from the first 
information signal using a color distribution generated from 
the first information signal. 

13. The system of claim 1 wherein the color distribution 
is estimated using a set of color codebooks, with each of the 
color codebooks corresponding to a different luminance 
level of the first information signal. 

14. The system of claim 1 wherein the processor is further 
operative to generate a color metric characterizing the 
similarity of the color information associated with the first 
and second information signals. 

15. The system of claim 1 wherein the processor is further 
operative to generate a texture metric characterizing the 
similarity of the texture information associated with the first 
and second information signals. 

16. A system for processing information signals, the 
system comprising: 

a processor operative to compare a first information signal 
to a second information signal in response to a user 
query, wherein the first information signal is selected 
by the user and the second information signal is stored
in a database associated with the system, wherein the 
processor extracts color and texture information from at 
least the first information signal using a predetermined 
vocabulary comprising one or more dimensions, and
generates a distance measure characterizing the rela- 
tionship of the first information signal to the second 
information signal by applying a grammar comprising 
a set of predetermined rules to the color and texture 
information extracted from the first information signal
and corresponding color and texture information asso- 
ciated with the second information signal, wherein the 
set of predetermined rules comprises one or more of the 
following rules: equal pattern, overall appearance, 
similar pattern, dominant color and general impression,
wherein the processor receives the first information 
signal in the form of an input image A submitted in 
conjunction with a query from the user, and wherein the 
processor is further operative to measure dimensions
$\mathrm{DIM}_i(A)$ from the vocabulary, for i=1, . . . , N, and for
each image B from an image database, to apply rules $R_i$
from the grammar to obtain corresponding distance
measures $\mathrm{dist}_i(A, B)$, where $\mathrm{dist}_i(A, B)$ is the distance
between the images A and B according to the rule i.

17. A system for processing information signals, the
system comprising: 

a processing device having a processor coupled to a 
memory, the processor being operative to compare a 
first information signal to a second information signal 
in response to a user query, wherein the first information
signal is selected by the user and the second
information signal is stored in a database associated 
with the system, wherein the processor extracts color 
and texture information from at least the first information
signal using a predetermined vocabulary comprising
one or more dimensions, and generates a distance
measure characterizing the relationship of the first 
information signal to the second information signal by 
applying a grammar comprising a set of predetermined 
rules to the color and texture information extracted
from the first information signal and corresponding 
color and texture information associated with the second
information signal, wherein the set of predetermined
rules comprises one or more of the following
rules: equal pattern, overall appearance, similar pattern, 
dominant color and general impression, wherein the 
processor receives the first information signal in the 
form of an input image A submitted in conjunction with 
a query from the user, and wherein the processor is 
further operative to measure dimensions $\mathrm{DIM}_i(A)$ from
the vocabulary, for i=1, . . . , N, and for each image B
from an image database, to apply rules $R_i$ from the
grammar to obtain corresponding distance measures
$\mathrm{dist}_i(A, B)$, where $\mathrm{dist}_i(A, B)$ is the distance between
the images A and B according to the rule i.

18. A system for processing information signals, the 
system comprising: 

a processing device having a processor coupled to a 
memory, the processor being operative to compare a 
first information signal to a second information signal 
in response to a user query, wherein the first informa- 
tion signal is selected by the user and the second 
information signal is stored in a database associated 
with the system, wherein the processor extracts color 
and texture information from at least the first informa- 
tion signal using a predetermined vocabulary compris- 
ing one or more dimensions, and generates a distance 
measure characterizing the relationship of the first 
information signal to the second information signal by 
applying a grammar comprising a set of predetermined 
rules to the color and texture information extracted 
from the first information signal and corresponding 
color and texture information associated with the sec- 
ond information signal, wherein the processor receives 
the first information signal in the form of an input 
image A submitted in conjunction with a query from the 
user, and wherein the processor is further operative to 
measure dimensions $\mathrm{DIM}_i(A)$ from the vocabulary, for
i=1, . . . , N, and for each image B from an image
database, to apply rules $R_i$ from the grammar to obtain
corresponding distance measures $\mathrm{dist}_i(A, B)$, where
$\mathrm{dist}_i(A, B)$ is the distance between the images A and B
according to the rule i.

* * * * * 